Climate Farmer Coding Challenge - Environmental Modeller & Researcher
I decided to work with Portugal because it includes various types of land cover and shows variability in soil organic carbon and climate. I imported the country shape from the GADM database. As I want to work with mainland Portugal only, I excluded Madeira and the Azores.
Shape of mainland Portugal used to restrict the analysis of soil, land cover and climate data.
Temperature (°C) in Portugal for January 2020.
Evapotranspiration (m) in Portugal for January 2020.
Precipitation (m) in Portugal for January 2020.
Land cover classes in Portugal in 2020.
The label and color information was extracted from the metadata of the land cover layer. However, I decided to change the color for the three cropland rainfed classes slightly, so they could be distinguished better.
Soil organic carbon content (t/ha) in Portugal.
| Layer Name | Resolution (lon) | Resolution (lat) | Origin (lon) | Origin (lat) |
|---|---|---|---|---|
| Climate variables | 0.1 | 0.1 | -9.58 | 36.97 |
| Land Cover | 0.0028 | 0.0028 | -9.5472 | 36.9611 |
| Soil Organic Carbon | 0.0023 | 0.0024 | -9.5468 | 36.9609 |
The climate variables (which share the same spatial dimensions) have a coarser resolution than the land cover and SOC layers and also a different origin. I therefore resampled the latter two to the resolution, origin and extent of the climate variables.
Land cover is a categorical variable, and I explored two options for resampling it. The first is the method “nearest neighbor”, which is typically used for categorical variables: it assigns each new pixel the value of the nearest original pixel without any interpolation. However, when resampling a land cover layer to a lower resolution, it can be more meaningful to assign each low-resolution pixel the class held by the majority of the high-resolution pixels it covers. I tested both options using the code below and compared the resulting maps visually.
# resample option 1 using nearest neighbor
landcover_pt_near <- terra::resample(landcover_pt, resample_raster, method = "near")
# resample option 2 using majority
landcover_pt_majority <-
  exactextractr::exact_resample(landcover_pt,
                                resample_raster,
                                'majority')
Land cover class patterns in Portugal in 2020 resampled to climate data with two different methods.
When resampling with method “near”, the land cover classes appear very patchy and fragmented. The layer resampled with “majority” represents the classes more continuously, and the logic of that method is sounder for aggregating to a lower resolution, so I use that layer for the analysis.
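The difference between the two approaches can be illustrated on a toy raster. The sketch below is not part of the original analysis: it uses a hypothetical 3 x 3 raster, and as a stand-in for `exact_resample()` with 'majority' it uses `terra::aggregate()` with `fun = "modal"`, which also returns the most frequent class within each coarse cell.

```r
library(terra)

# Hypothetical 3 x 3 fine raster: class 1 everywhere, class 2 in the center
fine <- rast(nrows = 3, ncols = 3, xmin = 0, xmax = 3, ymin = 0, ymax = 3)
values(fine) <- c(1, 1, 1,
                  1, 2, 1,
                  1, 1, 1)

# Single coarse cell covering the same extent
coarse <- rast(nrows = 1, ncols = 1, xmin = 0, xmax = 3, ymin = 0, ymax = 3)

# Option 1: nearest neighbor copies the fine cell closest to the coarse cell center
near_value <- values(resample(fine, coarse, method = "near"))[1]

# Option 2: most frequent class among the nine fine cells (stand-in for 'majority')
modal_value <- values(aggregate(fine, fact = 3, fun = "modal"))[1]

c(near = near_value, majority = modal_value)
```

The majority result is class 1, the dominant class, whereas the nearest neighbor result depends only on the single fine cell at the center of the coarse cell.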
Soil organic carbon (SOC) is a continuous variable, and I want to work with the mean SOC per lower-resolution cell. Continuous rasters are typically resampled with the bilinear method, which calculates the value at a grid location as a weighted average of the four nearest cell centers. I tested this option as well as calculating the mean with the `exact_resample()` function. That function averages all high-resolution cells covered by a lower-resolution cell, so the result is not based on only four grid cells. I compared the maps visually in terms of pattern.
# resample option 1 using bilinear interpolation
SOC_pt_bil <- terra::resample(SOC_pt, resample_raster, method = "bilinear")
# resample option 2 using the coverage-weighted mean
SOC_pt_mean <- exactextractr::exact_resample(SOC_pt,
                                             resample_raster,
                                             'mean')
Soil organic carbon layer in Portugal resampled with two different methods.
When resampling with `exact_resample()` and the function “mean”, the pattern of low SOC values along the coast and high values in the north of Portugal is maintained, so I use that layer for the analysis.
Before proceeding with the analysis, I checked whether the dimensions of all layers matched using the `compareGeom()` function.
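# check that the climate layers and the resampled layers share the same geometry
terra::compareGeom(evapotransp_pt, precipitation_pt, temperature_pt,
                   landcover_pt_majority, SOC_pt_mean)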
I analysed climate and soil organic carbon within the land cover classes in Portugal and over time. First, I explored the share of each land cover class within the country.
Land cover classes of Portugal and their proportions. All classes with bars below the dashed line, as well as water and NA, were excluded from further analysis.
For the next steps, I excluded the land cover classes that account for less than 1% of all pixels (i.e. 10 pixels or fewer), a threshold marked by the dashed line in the plot. I also excluded pixels with land cover “water” or “NA”, which made up 2.5% and 1% of all pixels, respectively.
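The exclusion step can be sketched in base R as follows. This is a minimal illustration, not the code used for the report: the data frame `landcover_df`, its class labels and the pixel counts are hypothetical.

```r
# Hypothetical land cover class per pixel (illustrative counts only)
landcover_df <- data.frame(
  class = c(rep("Tree cover", 600), rep("Cropland, rainfed", 300),
            rep("Shrubland", 65), rep("Urban", 5),
            rep("Water", 25), rep(NA, 5))
)

# Share of each class among all pixels, counting NA as its own category
class_share <- prop.table(table(landcover_df$class, useNA = "ifany"))

# Keep classes at or above the 1% threshold, then drop water and NA
keep <- names(class_share)[class_share >= 0.01]
keep <- setdiff(keep, c("Water", NA))

# Pixels that remain for the analysis
landcover_kept <- landcover_df[landcover_df$class %in% keep, , drop = FALSE]
table(landcover_kept$class)
```

Computing the shares with `useNA = "ifany"` keeps NA pixels in the denominator, so the 1% threshold refers to all pixels rather than only the classified ones.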
I examined the climate variables across land cover classes over time. Depending on the message to convey, the plots can emphasize different aspects. First, I plotted the monthly mean values from 2020 to 2022 for all land cover classes together. This makes differences in behavior between land cover classes easy to see.
Mean temperature in different land cover classes in Portugal from 2020 to 2022.
Mean precipitation in different land cover classes in Portugal from 2020 to 2022.
Mean evapotranspiration in different land cover classes in Portugal from 2020 to 2022.
Depending on the focus of the visualization, I could also look at each land cover class separately. This view is better suited to comparing overall differences between the individual patterns, and it allows plotting the standard deviation as errors around each line, which is hardly visible in the combined plot. I show temperature here as an example.
Mean temperature in each land cover class in Portugal from 2020 to 2022, shown separately per class.
I can also visualize mean soil organic carbon per land cover class. As I only have one time point for this layer, I used a bar plot.
Average soil organic carbon (t/ha) per land cover class in Portugal.
Error bars show the standard deviation around the mean per class.
I calculated the necessary sample size following:
\[ n = \left(\frac{z \times \sigma}{E}\right)^2 \]
to detect changes in SOC with a 95% confidence interval whose margin of error is at most 10% of the mean value, assuming a Gaussian distribution. The formula gives the sample size (n) required to estimate a population mean within a desired margin of error (E) at a specified confidence level, given the standard deviation of the population (σ) and the critical value of the standard normal distribution (z).
The sample size can thus be calculated with the following code:
# set the confidence level and derive the inputs from the SOC values
confidence_level <- 0.95
z_score <- qnorm(1 - (1 - confidence_level) / 2)
standard_deviation <- sd(sampling_env$SOC_df$SOC)
# margin of error E: 10% of the mean SOC
desired_width <- 0.1 * mean(sampling_env$SOC_df$SOC)
# calculate the number of samples required, rounded up to a whole sample
nr_samples <- ceiling((z_score * standard_deviation / desired_width) ^ 2)
Working with the SOC layer at the resampled ~0.1° resolution, the necessary sample size is thus 28. This is the minimum sample size needed for the analysis.
Working with the SOC layer at the original ~250 m resolution, and thus with higher variability in the data, the necessary sample size would be 40.
If we wanted to place these samples in space, we could use the `spatSample()` function to suggest random coordinates within Portugal.
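A minimal sketch of such a random draw with `terra::spatSample()` is shown below. The raster `soc_demo` is a hypothetical stand-in filled with random values over a bounding box around mainland Portugal. In the real analysis the resampled SOC layer would be used instead, where cells outside the country are NA and are skipped by `na.rm = TRUE`.

```r
library(terra)

# Hypothetical 0.1-degree raster over a box roughly covering mainland Portugal
set.seed(42)
soc_demo <- rast(xmin = -9.6, xmax = -6.1, ymin = 36.9, ymax = 42.2,
                 resolution = 0.1, crs = "EPSG:4326")
values(soc_demo) <- runif(ncell(soc_demo), min = 20, max = 120)  # made-up SOC values
names(soc_demo) <- "SOC"

# Number of samples, taken from the sample size calculation in the text
n_samples <- 28

# Draw random cells and return their coordinates together with the cell values
sample_pts <- spatSample(soc_demo, size = n_samples, method = "random",
                         na.rm = TRUE, xy = TRUE)
head(sample_pts)
```

With `xy = TRUE` the result is a data frame with the columns `x` and `y` (cell center coordinates) and `SOC`, which can be plotted on top of the SOC map.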
Random sampling scheme for the soil organic carbon content (t/ha) in Portugal for the lower resolution of the soil layer.
We can also design a sampling scheme for soil organic carbon content that is stratified by land cover class. In this case, the necessary sample size is calculated separately for each class from the SOC values of all pixels in that class.
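The per-class calculation can be sketched in base R by applying the sample size formula to each stratum. The data frame `soc_df`, its class labels and its SOC values are hypothetical, and the helper `sample_size()` is introduced here only for illustration.

```r
# Hypothetical SOC values (t/ha) per pixel and land cover class
set.seed(1)
soc_df <- data.frame(
  class = rep(c("Tree cover", "Cropland, rainfed", "Shrubland"),
              times = c(300, 200, 100)),
  SOC = c(rnorm(300, 80, 25), rnorm(200, 45, 10), rnorm(100, 60, 20))
)

# Sample size for a margin of error of rel_error * mean at the given confidence level
sample_size <- function(x, confidence_level = 0.95, rel_error = 0.1) {
  z <- qnorm(1 - (1 - confidence_level) / 2)
  margin <- rel_error * mean(x, na.rm = TRUE)
  ceiling((z * sd(x, na.rm = TRUE) / margin)^2)
}

# One sample size per land cover class
n_per_class <- sapply(split(soc_df$SOC, soc_df$class), sample_size)

# Draw that many pixels at random within each class
strata <- split(soc_df, soc_df$class)
sampled <- do.call(rbind, lapply(names(strata), function(cl) {
  d <- strata[[cl]]
  d[sample(nrow(d), min(n_per_class[[cl]], nrow(d))), ]
}))
table(sampled$class)
```

Classes with a larger standard deviation relative to their mean receive more samples, which is the reason for stratifying in the first place.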
Stratified sampling scheme for the soil organic carbon content (t/ha) within land cover classes in Portugal.
I implemented a very simple soil model using the package ‘SoilR’ and its RothC implementation. The setup assumes that the only information available is the percent clay content of the topsoil, which I extracted for a point within Portugal from the SoilGrids database, an assumed annual amount of litter inputs, and monthly averages of climatic variables for that same point. The model is run 300 years into the future.
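A setup of this kind might look like the sketch below, which follows the general pattern of the RothC example in the SoilR documentation. All input values (climate, clay content, soil thickness, litter inputs, initial SOC) are hypothetical placeholders and not the values used for the report. The initial pool sizes and the estimate of the inert pool with the Falloon equation are likewise assumptions.

```r
library(SoilR)

# All input values below are hypothetical placeholders, not the values used in the report
temp_c   <- c(10, 11, 13, 15, 18, 22, 24, 24, 22, 18, 13, 11)      # monthly mean temperature (deg C)
precip   <- c(90, 70, 60, 60, 50, 20, 5, 10, 40, 90, 100, 100)     # monthly precipitation (mm)
evap     <- c(30, 40, 60, 80, 110, 130, 150, 140, 100, 60, 40, 30) # monthly evaporation (mm)
clay     <- 20    # percent clay in the topsoil
thick    <- 25    # soil thickness (cm)
c_inputs <- 2.5   # annual litter inputs
soc_init <- 60    # SOC stock, used only to estimate the inert pool

years <- seq(1 / 12, 300, by = 1 / 12)  # monthly time steps over 300 years

# Monthly rate modifiers for temperature and moisture, recycled over the whole run
f_temp  <- fT.RothC(temp_c)
f_moist <- fW.RothC(P = precip, E = evap, S.Thick = thick,
                    pClay = clay, pE = 1.0, bare = FALSE)$b
xi_frame <- data.frame(years, rep(f_temp * f_moist, length.out = length(years)))

# Inert organic matter estimated with the Falloon equation (an assumption here)
iom <- 0.049 * soc_init^1.139

model <- RothCModel(t = years,
                    C0 = c(DPM = 0, RPM = 0, BIO = 0, HUM = 0, IOM = iom),
                    In = c_inputs, clay = clay, xi = xi_frame)

pools <- getC(model)  # one column per pool, one row per time step
colnames(pools) <- c("DPM", "RPM", "BIO", "HUM", "IOM")
tail(pools, 1)        # pool sizes at the end of the 300-year run
```

The last row of `pools` corresponds to the final pool sizes, and `matplot(years, pools, type = "l")` gives a quick view of the trajectories of all five pools.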
Output of simple soil model using RothC.
The final pool sizes of Decomposable Plant Material (DPM), Resistant Plant Material (RPM), Microbial Biomass (BIO), Humified Organic Matter (HUM), and Inert Organic Matter (IOM) for this point in Portugal under the assumed parameters are:
| DPM | RPM | BIO | HUM | IOM |
|---|---|---|---|---|
| 0.1477609 | 2.1369862 | 0.2786526 | 11.4037173 | 5.4357393 |
This simple model could be further expanded and tested for other areas.